ABSTRACT
Classically, the invariant property of maximum likelihood estimators
has been limited by one-to-one restrictions on the transformation. This
thesis defines the Induced Likelihood Function and develops a theorem
which may be used to extend the invariant property to estimation problems
where the one-to-one restriction is dropped. It is shown that the theorem
is applicable to the k-dimensional estimation problem.
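Theorem:

If 1) $f$ is a function such that $S$ is mapped into $E_1$ and
      $f(\hat{\theta}) \ge f(\theta)$ for all $\theta$ in $S$.

   2) $\phi$ is a transformation such that $S$ is mapped onto $S^*$,
      where $\phi(\hat{\theta}) = \hat{\theta}^*$ and $\phi(\theta) = \theta^*$ for all $\theta$ in $S$.
      Define an inverse on $S^*$ such that $\phi^{-1}(\hat{\theta}^*) = \hat{\theta}$ and
      $\phi^{-1}(\theta^*) = \theta$ for all $\theta^*$ in $S^*$.

   3) $g$ is a function defined by $g(\theta^*) = f(\phi^{-1}(\theta^*))$ such
      that $S^*$ is mapped into $E_1$

then $g(\hat{\theta}^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$.
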
The writer wishes to express his appreciation to Professor P. W.
Zehna for his guidance, assistance and the essential elements for the
proof of the above theorem and to Professor J. R. Borsting for his
suggestions and encouragement.
SECTION I
INTRODUCTION 

1.1  Statement  of  the  Problem 

The invariance property of maximum likelihood estimation provides a
very convenient tool for statistical application. However, its use is
somewhat limited in practical applications since the property apparently
has only been shown to hold when the transformation of the parameter space
is one-to-one. This investigation evolved as a follow-up to one phase of
a reliability study undertaken by Captain W. J. Corcoran, USN, and
Dr. H. Weingarten of the Technical Division of the Special Projects Office
of the U. S. Navy and Dr. P. W. Zehna of CEIR (13)(14). This SP-sponsored
study presented several estimators for the parameters of a conceptual
reliability model based on the multinomial probability distribution. One
of the proposed estimates was "like" a maximum likelihood estimate (mle)
in that it was a function of mle's; but since the function was not 1-1,
the estimate was not formally called a mle. Attempts to derive distribution
information concerning this estimate involved very complicated equations,
and these difficulties were compounded by the fact that, under current
definitions and concepts, maximum likelihood estimation (MLE) distribution
theory was not correctly applicable. It was felt by Dr. Zehna that one
of the primary questions that had to be answered prior to further work on
the model was, "Does the invariance property of MLE apply when the
functional relationship is not 1-1, and if so, under what conditions?"


1.2  Purpose  of  the  Study 

The purpose of this study is to investigate the invariance property
of maximum likelihood estimation when the transformation function is not
one-to-one, and to attempt to formalize concepts, definitions, and theorems
that are applicable in this and similar situations.

1.3  Thesis  Scope  and  Organization 

Of necessity, it is assumed that the reader has a basic familiarity
with the theory of probability and statistics, and the method and
properties of maximum likelihood estimation. A brief review of some of the more
pertinent concepts of point estimation along with a summary of the
technique of MLE is presented in chapter two to provide a minimal common
background and to assure familiarity with the notation as it is used in later
discussions. Appendix one contains a chronological key to all notation
used and is referenced by numbers indicating the page in the thesis on
which the notation was originally introduced.

In chapter three the invariance property of MLE is discussed and
concepts and theorems are developed which allow the present theory to be
generalized and extended. Examples are liberally used to emphasize the points
under discussion. Chapter four, in summary, attempts to indicate the
possible contributions of the thesis, and suggests possible areas for further
study.


SECTION  II 
MAXIMUM  LIKELIHOOD  ESTIMATION 


2.1  Estimation;  Basic  Concepts 

The purpose of statistical estimation is to estimate, on the basis
of an observed sample, the values of the unknown parameters of the
population from which the sample originated. Since 1763, when Bayes' memoirs were
published posthumously, there has been wide controversy and discussion
concerning the various estimation techniques and the properties of the
resulting estimates. Over the years several of these descriptive properties or
characteristics have emerged as "desirable" traits of estimators. After
presenting some concepts and definitions, several of the properties usually
related to maximum likelihood estimates are briefly discussed.


Symbol/Term                  Definition

$x_1, x_2, \ldots, x_n$      A sample or outcome of observed values of the
                             random variables $X_1, X_2, \ldots, X_n$

$\theta_1, \theta_2, \ldots$ The parameters of an experiment - generally
                             indices for some family of probability
                             distributions

parameter                    A constant of a probability distribution,
                             generally unknown in estimation problems

$f(x;\theta)$                The probability density function of the random
                             variable X with parameter indexed by $\theta$,
                             denoted pdf

$E(x)$                       The expectation of x

estimator                    A statistic; a rule for making an estimate of a
                             parameter; a function of the observed values of
                             the random variables. An estimator is derived
                             prior to sampling.

estimate                     A numerical value assigned to a parameter of a
                             distribution on the basis of evidence from
                             samples; an observed value of an estimator.
                             An estimate is made after the sampling. (In this
                             paper estimate will imply "statistical point
                             estimate" unless otherwise indicated.)

The distinction between the parameter and its estimate is an important
one. The true parameter value is fixed and unknown. However, with
repetition of an experiment, the sample will vary, and the estimate itself
will vary and will have a probability distribution. Estimation techniques
are derived with the assumption that a sample is representative of the true
population; the parameter estimate is therefore subject to sampling errors.
The possible magnitude of sampling error is an important consideration and
leads to interval estimation, which is not discussed in this paper.

2.2  The  Method  of  Maximum  Likelihood  Estimation 

The method of moments, introduced by Karl Pearson in 1894, was the
earliest formal technique proposed for point estimation. Since that time many
estimation procedures have been devised, the best known of which are the
methods of minimum chi square, Bayes, minimax, least squares, and maximum
likelihood. It has been said that in many respects the introduction of
maximum likelihood estimation marked the era of modern statistical theory.¹
The principle of maximum likelihood was discussed by Gauss prior to 1880,
but R. A. Fisher formally developed maximum likelihood estimation (MLE) as
a technique in a series of papers, the first of which was presented in
1921 (20).

¹ D. A. Fraser, Statistics, an Introduction, John Wiley and Sons, Inc.,
p. 224, 1958


Gauss had stated the concept in the following manner. Assume a random
variable (vector) X with real values $(x_1, \ldots, x_n)$ where the pdf of X is a
function of the parameter(s) indexed by $\theta$. Let $\theta$ have a known or assumed
prior distribution with range a to b. Then the posteriori distribution of
$\theta$ given X = x is

$$f(\theta \mid x) = \frac{f(\theta, x)}{\int_a^b f(\theta, x)\, d\theta} = c(x)\, f(\theta, x).$$

Gauss used the mode of the derived posteriori distribution as an estimate
of $\theta$. This value is what is commonly known today as the maximum likelihood
estimate, the value of $\theta$ which maximizes the pdf of X with respect to $\theta$.²

Fisher in his development derived what became known as the "Likelihood
Function", the product of the population densities for each value in the
sample. This function is denoted $L(\theta)$, where $L(\theta) = \prod_i f(x_i;\theta)$, and is
regarded as a function of $\theta$ for fixed $x_i$. The method of maximum likelihood
estimation is defined by maximizing this function. Since the logarithm
is a monotonic increasing function, $L(\theta)$ and its log are maximized by the
same value of $\theta$. This is sometimes convenient since manipulation of
$\log L(\theta)$ is often much easier than working with the function directly.

The procedure for determining the mle of $\theta$ is as follows:

1) Determine the pdf, $f(x;\theta)$.

2) Determine $L(\theta) = \prod_i f(x_i;\theta)$ and express as $\log L(\theta)$ if appropriate.

3) Determine a value of $\theta$ which will maximize $L(\theta)$. This value
is usually found by setting the derivative(s) of the likelihood function
with respect to $\theta$ equal to zero and solving the ensuing equation(s) for the
parameter value(s) when conditions exist that make this possible. If $L(\theta)$
is differentiable and has its maximum at an interior point of the range of
$\theta$, the point at which $L(\theta)$ attains this maximum is the mle of $\theta$, denoted $\hat{\theta}$,
and the "Likelihood Equation" is

$$\left[\frac{\partial}{\partial\theta} \log L(\theta)\right]_{\theta=\hat{\theta}} = 0.$$

² E. L. Lehmann, Notes on the Theory of Estimation, University of
California, p. 1-9, 1950


Setting the derivative of a function equal to zero and solving in terms
of a parameter does not in itself guarantee a maximizing value. If there
is any doubt as to the authenticity of the solution, there should be further
investigation to verify the underlying assumptions, namely that the
likelihood equations generally have only maximizing solutions. Lindgren points
out that this is usually the case since $L(\theta)$ is a product of probability
densities and is usually bounded above and continuous in $\theta$.³
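
The three-step procedure can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it assumes an exponential pdf $f(x;\theta) = \theta e^{-\theta x}$, uses a made-up sample `xs`, and substitutes a golden-section search (a hypothetical helper, `maximize_on_interval`) for solving the likelihood equation analytically. It then compares the numerical maximizer with the closed-form root $1/\bar{x}$.

```python
import math

def log_likelihood(theta, xs):
    """log L(theta) for the assumed pdf f(x; theta) = theta * exp(-theta * x)."""
    n = len(xs)
    return n * math.log(theta) - theta * sum(xs)

def maximize_on_interval(func, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if func(c) > func(d):
            b = d          # the maximum lies in [a, d]
        else:
            a = c          # the maximum lies in [c, b]
    return (a + b) / 2.0

xs = [0.5, 1.2, 0.3, 2.0, 1.0]        # hypothetical sample, not from the thesis
x_bar = sum(xs) / len(xs)

# Step 3 done numerically: maximize log L(theta) over a wide interval.
theta_numeric = maximize_on_interval(lambda t: log_likelihood(t, xs), 1e-6, 50.0)

# Step 3 done analytically: the likelihood equation n/theta - sum(x) = 0.
theta_closed = 1.0 / x_bar
```

Because the log-likelihood here is concave in $\theta$, the single stationary point is the maximum, which is the situation Lindgren describes as the usual one.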

2.3  Desirable  Properties  of  Estimators 

There are many ways that an estimator may be chosen. Hopefully
statistical techniques provide the tools for choosing "good" estimators. To
help describe what is meant by "good", several generally desirable
properties or characteristics of estimators have been defined. The properties
usually associated with mle's are discussed below.

1) Unbiasedness: This property is concerned with the distribution
of the estimator. An estimator $\hat{\theta}(x_1, \ldots, x_n)$ for the parameter
$\theta$ is said to be unbiased if $E(\hat{\theta}) = \theta$. Then, the bias of $\hat{\theta}$, denoted b, is
$b = E(\hat{\theta}) - \theta$. Although unbiasedness is a desirable trait, it is by no means
paramount. Figure 1 shows the densities of three estimators of $\theta$. Although
$\hat{\theta}_1$ and $\hat{\theta}_2$ are both unbiased, $\hat{\theta}_3$ is obviously the best estimator of the three
even though it has positive or right bias. It is apparent that unbiasedness
considered alone does not guarantee a good estimator. The distributions,
variance, and sample size all modify the bias.

³ B. W. Lindgren, Statistical Theory, The Macmillan Company, p. 222,
1960.

2) Consistency: Consistency is a large sample property of an
estimator. An estimator is said to be consistent if its probability
distribution concentrates on the true parameter value as the sample size
becomes infinite. That is, $\hat{\theta}$ is consistent if $P(|\hat{\theta} - \theta| < \epsilon) \to 1$ as $n \to \infty$
for every $\epsilon > 0$.

Figure 1. Density Functions of Three Estimators of the Parameter $\theta$

An unbiased estimator is consistent if its variance approaches zero as the
sample size approaches infinity.

There may be many consistent estimators of a parameter. Therefore, as
with unbiasedness, the criterion of consistency alone does not guarantee a
useful estimator, although consistency is usually a desirable property.

⁴ A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 149, 1950.

⁵ H. Cramér, Mathematical Methods of Statistics, Princeton University
Press, p. 351, 1946

3) Efficiency: Efficiency provides a criterion for comparing
unbiased estimates of a parameter. As mentioned previously, once it is
known that the distribution of the estimator is centered on the true value
of the parameter, the variance of the distribution becomes an important
consideration. If $\hat{\theta}_1$ and $\hat{\theta}_2$ are estimates of $\theta$, the "Relative Efficiency"
of $\hat{\theta}_1$ to $\hat{\theta}_2$ is $(k_2/k_1) \times 100\%$ where $k_i = E(\hat{\theta}_i - \theta)^2$. If the ratio $k_2/k_1$
is greater than one, $\hat{\theta}_1$ may be considered a more efficient, and therefore
perhaps a more suitable estimate than $\hat{\theta}_2$. If $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased
estimates of $\theta$, then $k_2/k_1$ is a ratio of variances and will take on its highest
values when $\hat{\theta}_1$ is an estimate with minimum variance. R. A. Fisher proposed
that the estimator having a minimum variance in large samples should be
called "Efficient". This idea was formalized by a definition very similar
to the following:

Definition: $\hat{\theta}$ is said to be an efficient estimator of $\theta$ if:

1) $\sqrt{N}(\hat{\theta} - \theta)$ approaches $N(0, \sigma^2)$ as N approaches infinity.

2) for any other estimator $\theta^*$ for which $\sqrt{N}(\theta^* - \theta)$ approaches
   $N(0, \sigma^{*2})$, $\sigma^{*2} \ge \sigma^2$. (The efficiency of $\theta^*$ is
   $(\sigma^2/\sigma^{*2}) \times 100\%$.)

The Cramér-Rao inequality may be used to find the limiting value of
mean square deviations (variances for unbiased estimators). Efficient
estimators are consistent but are not necessarily unbiased except in the limit.⁷

4) Sufficiency: An estimator is sufficient if "it contains all
the information in the sample regarding the parameter";⁸ that is, it utilizes
all of the pertinent information in the sample.

⁶ ibid, p. 477

⁷ A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 151, 1950

⁸ ibid


Definition: $\hat{\theta}$ is a sufficient estimator of $\theta$ if, given the
value of $\hat{\theta}(x_1, \ldots, x_n)$, the conditional distribution of the
sample is independent of the parameter $\theta$.

In many situations the evaluation and manipulation of conditional
distributions is very difficult; however, the following criterion allows
determination of sufficiency by discerning if the joint density function can
be properly factored.

Theorem 2.1: An estimator is sufficient if and only if the
probability density function can be factored into two functions
g and h, where h is dependent on the estimator and the
parameter and g is independent of the parameter. That is, $\hat{\theta}$ is
sufficient if $\prod_i f(x_i;\theta) = g(x_1, \ldots, x_n)\, h(\hat{\theta};\theta)$.

If sufficient statistics exist, it has been shown that they will be
solutions of maximum likelihood.⁹

5) Invariance: This property, which is to be discussed at length
in chapter three, is usually associated with maximum likelihood estimation.
The property implies that if the mle of $\theta$ is $\hat{\theta}$ and certain regularity
conditions are satisfied, a mle of $\phi(\theta)$ is $\phi(\hat{\theta})$. That is, a mle of a function of
$\theta$ is simply the function with the value of $\hat{\theta}$ substituted for $\theta$.

⁹ R. A. Fisher, Contributions to Mathematical Statistics, John Wiley
and Sons, Inc., p. 224, 1958

Maximum likelihood estimates are usually biased, consistent, efficient,
invariant, and a function of a sufficient statistic if one exists. Under
fairly general regularity conditions $\hat{\theta}$ is asymptotically normally
distributed, has finite variance with limiting value $1/I(\theta)$ where
$I(\theta) = nE\left\{\left[\frac{\partial}{\partial\theta} \log f(X;\theta)\right]^2\right\}$, and therefore is asymptotically efficient.¹⁰
No other asymptotically normally distributed estimator can have smaller
variance.¹¹ If an efficient statistic exists for small samples (i.e. with
minimum variance), a mle with bias correction, if necessary, will be it.¹²
This follows from the fact that if there is an unbiased efficient estimate,
the maximum likelihood method will produce it.¹³ Similarly, if there is a
sufficient statistic for estimating the true parameter value, any solution
of the likelihood equation will be a function of it.

From the preceding summary, it can be seen why MLE has become a favored
and often used technique in the field of statistical estimation. Although
each of the estimation techniques has its strong points and proponents
(Pearson hotly defended the method of moments as "best" (44)), the mle is
generally expected to exhibit more of the desirable properties of a point
estimator. Still, for certain instances, depending on the situation and problem
at hand, the use of other estimation techniques may seem more logical and/or
be easier. In fact, for certain distributions different techniques may
produce the same estimate, although generally they are different. The methods
of moments and maximum likelihood produce the same estimates for the
parameters of the normal, Poisson, and binomial probability distributions.¹⁴

¹⁰ H. Cramér, Mathematical Methods of Statistics, Princeton University
Press, pp. 500-506, 1946

¹¹ A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 160, 1950

¹² R. L. Anderson and T. A. Bancroft, Statistical Theory in Research,
McGraw-Hill Book Company, Inc., p. 102, 1952

¹³ B. W. Lindgren, Statistical Theory, The Macmillan Company, p. 226,
1960

¹⁴ S. S. Wilks, Mathematical Statistics, Princeton University Press,
p. 146, 1943


2.4  Review  of  the  Literature 

A search of literature failed to yield any new and significant
information concerning the application of the invariance principle to MLE. Of the
college level statistics texts reviewed, 17 contained sections on point
estimation. Of these, only five discussed invariance as associated with
maximum likelihood estimation, and each of these was restricted by the condition
that the functional relationship be single valued or one-to-one. It is
interesting to note that Mood¹⁵ in his discussion of the property of invariance
as applicable to MLE states that, ". . . if $\hat{\theta}$ is the maximum-likelihood
estimate for $\theta$, and if $u(\theta)$ is any single-valued function of $\theta$, then $u(\hat{\theta})$ is
the maximum-likelihood estimate for $u(\theta)$." However, in his proof of this
property, it is implicitly assumed that an inverse function $\theta = v(u)$ is
defined and he shows that the mle for u is the value of u that maximizes
$L(v(u))$. Then, in addition to the necessity of having a single-valued
function for the property to be applied as described by Mood, the inverse
function must also exist. But even when the function is single-valued (but
many to one, of course) there are many ways to define an inverse function.
As shall be seen below, special care must be exercised in defining such an
inverse. This also illustrates one of the situations motivating this
investigation, namely, that discussions of the invariant property are often
incomplete in the above sense.

A conspicuous absence of literature concerning this property could be
construed to indicate either that the problem is so trivial that it is
unnecessary to record methods of application, or that the problem is of no
practical or theoretical interest. Preliminary investigation of the problem
at hand leads to rejection of both alternatives. The SP study mentioned in
the opening pages of this paper is just one of many indicators that the
problem is not trivial. Also, it provides a real practical application of the
invariant property of MLE in an area not adequately covered by present
concepts and definitions.

¹⁵ A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 159, 1950
SECTION III
A NEW APPROACH

3.1 The Induced Likelihood Function

We have seen that previous applications of the invariance principle to
the method of maximum likelihood estimation have been restricted so as to
provide a 1-1 relationship between the domain and range of the functions of the
parameters being estimated. Below a theorem is stated as it usually occurs
in the 1-1 estimation problem.

Theorem 3.1:

If 1) $f: S \to E_1$ (read, the function f maps S into $E_1$)

   2) $\phi: S \to S^*$ (1-1, onto), therefore $\phi^{-1}: S^* \to S$ (1-1, onto)

   3) $g: S^* \to E_1$ defined by $g(\theta^*) = f(\phi^{-1}(\theta^*))$

   4) there exists an element of S, denoted $\theta_0$, such that
      $f(\theta_0) \ge f(\theta)$ for all $\theta$ in S

then $g(\theta_0^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$, where $\theta_0^* = \phi(\theta_0)$.

Proof: 1) Let $\theta^*$ be an element of $S^*$. Then $\phi^{-1}(\theta^*)$ is an
          element of S and

       2) $g(\theta^*) = f(\phi^{-1}(\theta^*)) \le f(\theta_0) = f(\phi^{-1}(\theta_0^*)) = g(\theta_0^*)$,

          i.e. $g(\theta_0^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$, and if $\theta_0$ is
          unique, strict inequality holds and $\theta_0^*$ is unique.

So far both the theorem and notation are conventional, and application
of the theorem to maximum likelihood estimation is as follows. Let S be
the parameter space of the estimation problem; $E_1$ is the real line. The
likelihood function $L(\theta)$ is such that $L: S \to E_1$ and $L(\hat{\theta}) \ge L(\theta)$ for all
$\theta$ an element of S. Suppose there exists a function $\phi$ such that $\phi: S \to S^*$ (1-1, onto)
so that $\phi^{-1}: S^* \to S$ (1-1, onto). Define the "Induced Likelihood Function",
$M(\theta^*) = L(\phi^{-1}(\theta^*))$, the likelihood function induced on $S^*$. We now have the
essential elements to apply theorem 3.1.

1) $L: S \to E_1$

2) $\phi: S \to S^*$ (1-1, onto)

3) $M: S^* \to E_1$

4) $\hat{\theta}$ is the value of $\theta$ such that $L(\hat{\theta}) \ge L(\theta)$ for all $\theta$ an element of S

Therefore if $\hat{\theta}^* = \phi(\hat{\theta})$, by theorem 3.1 $M(\hat{\theta}^*) \ge M(\theta^*)$ for all $\theta^*$ in $S^*$ and
a mle of $\theta^*$ is $\phi(\hat{\theta})$. If $\hat{\theta}$ is unique, then $\hat{\theta}^*$ is unique.

Although it becomes apparent with application, let it be emphasized at
this point that the concept of the induced likelihood function (ILF) and
the manner in which it is defined is a most important element of the
application of the theorem. A new likelihood function is defined on $S^*$ and the
mle is the parameter value in $S^*$ which maximizes this new function.

Prior to looking at situations in which the 1-1 condition is dropped,
consider the following interesting example which emphasizes the importance
of the definition of the new likelihood function on the transformed parameter
space $S^*$. Let all of the essential conditions of theorem 3.1 hold and let
$S^*$ be contained in S. The theorem still applies and along with conventional
MLE procedures produces $\phi(\hat{\theta})$ as the mle of $\theta^*$. That is, the MLE procedure
on S is carried out as usual and produces $\hat{\theta}$, a mle of $\theta$. However, in this
case $L(\theta)$ is defined not only on S but on $S^*$ as well. What happens when
the likelihood function is restricted to $S^*$? Naturally, it is not expected
that the restricted mle will always be the same as that produced in the
unrestricted case since the unrestricted estimate may not be a member of $S^*$.
However, the interesting fact is that $\phi(\hat{\theta})$ is not necessarily the restricted
mle even when $\hat{\theta}$ is an element of $S^*$. As an example consider the exponential
distribution.


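1) Let $f(x;\theta) = \theta e^{-\theta x}$ for $\theta > 0$

2) $L(\theta) = \prod_i f(x_i;\theta) = \theta^n e^{-n\bar{x}\theta}$

   $L'(\theta) = n\theta^{n-1} e^{-n\bar{x}\theta}(1 - \bar{x}\theta)$

   $\hat{\theta} = 1/\bar{x}$

3) Let $\lambda = \phi(\theta) = \frac{\theta}{1+\theta}$, so that $\theta = \phi^{-1}(\lambda) = \frac{\lambda}{1-\lambda}$

   and $g(x;\lambda) = \left(\frac{\lambda}{1-\lambda}\right) e^{-\left(\frac{\lambda}{1-\lambda}\right)x}$ for $0 < \lambda < 1$

4) Let $M(\lambda) = L(\theta \mid 0 < \theta < 1)$ for the restricted estimation problem.
   Then $\hat{\lambda} = 1/\bar{x}$ for $\bar{x} \ge 1$ and is undefined for $\bar{x} < 1$.

5) However, if $M(\lambda) = L(\phi^{-1}(\lambda)) = L\left(\frac{\lambda}{1-\lambda}\right)$, all of the essential
   conditions of theorem 3.1 are fulfilled.

   1) $L(\theta): S \to E_1$

   2) $\phi(\theta): S \to S^*$ (1-1, onto)

   3) $M(\lambda): S^* \to E_1$ is defined by $M(\lambda) = L(\phi^{-1}(\lambda))$

   4) $\hat{\theta}$ is the value of $\theta$ such that $L(\hat{\theta}) \ge L(\theta)$ for all $\theta$ in S.

Therefore by theorem 3.1 $M(\lambda)$ is maximized by $\hat{\lambda} = \phi(\hat{\theta}) = \frac{\hat{\theta}}{1+\hat{\theta}} = \frac{1}{1+\bar{x}}$, and
the restricted mle ($1/\bar{x}$ for $\bar{x} \ge 1$) is not equal to $\phi(\hat{\theta}) = \frac{1}{1+\bar{x}}$. Had $S^*$ not been
contained in S, the defining of the ILF would be absolutely necessary since
$L(\theta)$ would have no meaning on $S^*$.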
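
The contrast between the two likelihood functions on (0, 1) can be checked numerically. The sketch below is illustrative only: the values of `n` and `x_bar` are hypothetical, and a simple grid search stands in for the calculus. It evaluates both the restricted likelihood $L(\theta)$ on $0 < \theta < 1$ and the induced likelihood $M(\lambda) = L(\lambda/(1-\lambda))$ on $0 < \lambda < 1$.

```python
import math

def log_L(theta, n, x_bar):
    """log L(theta) = n log(theta) - n * x_bar * theta (exponential sample)."""
    return n * math.log(theta) - n * x_bar * theta

def phi(theta):
    return theta / (1.0 + theta)

def phi_inv(lam):
    return lam / (1.0 - lam)

def log_M(lam, n, x_bar):
    """Induced likelihood: the original likelihood evaluated at phi_inv(lam)."""
    return log_L(phi_inv(lam), n, x_bar)

n, x_bar = 10, 2.0                                  # hypothetical summary statistics
grid = [k / 10000.0 for k in range(1, 10000)]       # interior points of (0, 1)

lam_induced = max(grid, key=lambda lam: log_M(lam, n, x_bar))
theta_restricted = max(grid, key=lambda t: log_L(t, n, x_bar))

theta_hat = 1.0 / x_bar                             # unrestricted mle of theta
```

With these inputs the induced maximizer sits near $\phi(\hat{\theta}) = 1/(1+\bar{x}) = 1/3$, while the restricted maximizer sits at $1/\bar{x} = 1/2$, so the two functions defined on the same interval give different answers.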

Taking note of the use of the 1-1 property in conventional maximum
likelihood estimation, it is seen that the assumption that $\phi$ is 1-1 is used only
in defining $M(\theta^*)$ as a single valued function. If $\phi$ is not 1-1, how may the
MLE problem be handled? As before, the key concept is the characterization
of the new likelihood function, and it can be shown that, with proper
definition of the ILF, it is still maximized at $\phi(\hat{\theta})$.
Consider the case where $f: S \to E_1$ and $\phi: S \to S^*$ is onto; that is, the
function is exhaustive but not necessarily 1-1. The ILF, the likelihood
function induced on $S^*$, is defined in the following manner. If $L(\hat{\theta}) \ge L(\theta)$ for
all $\theta$ an element of S, then let $\hat{\theta}^*$ be any value of $\phi(\hat{\theta})$. Using the Axiom of
Choice, if necessary, define an inverse on $S^*$ such that $\phi^{-1}(\hat{\theta}^*) = \hat{\theta}$ and for
any other $\theta^*$ in $S^*$, $\phi^{-1}(\theta^*) = \theta$ where $\theta$ is any element of S such that
$\phi(\theta) = \theta^*$. Then $\phi^{-1}: S^* \to S$. Now theorem 3.1 can be extended and stated
in a more general form.

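Theorem 3.2:

If 1) $f: S \to E_1$ and $f(\hat{\theta}) \ge f(\theta)$ for all $\theta$ in S

   2) $\phi: S \to S^*$ (onto), $\phi(\hat{\theta}) = \hat{\theta}^*$,
      and $\phi^{-1}: S^* \to S$ defined as above

   3) $g: S^* \to E_1$ defined by $g(\theta^*) = f(\phi^{-1}(\theta^*))$

then $g(\hat{\theta}^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$.

Proof:

1) Let $\theta^*$ be an element of $S^*$.

2) $g(\theta^*) = f(\phi^{-1}(\theta^*)) = f(\theta) \le f(\hat{\theta}) = f(\phi^{-1}(\hat{\theta}^*)) = g(\hat{\theta}^*)$

Thus, $g(\hat{\theta}^*) \ge g(\theta^*)$ for all $\theta^*$ an element of $S^*$.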

In the estimation problem let $M(\theta^*) = L(\phi^{-1}(\theta^*))$. Then $M(\hat{\theta}^*) \ge M(\theta^*)$,
so $M(\theta^*)$ is maximized by $\hat{\theta}^* = \phi(\hat{\theta})$. The mle of $\theta^*$ is $\phi(\hat{\theta})$ just as in the
1-1 estimation situation. The maximization of $M(\theta^*)$ may not, in effect,
have been over all the elements in S since $\phi^{-1}$ is not onto S, but it has
taken place over the set containing $\hat{\theta}$, which is the essential factor.
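
The construction behind theorem 3.2 can be made concrete on a finite parameter space. The following is a minimal sketch, not part of the original development: the parameter space, the stand-in likelihood, and the helper names `build_inverse` and `induced_likelihood` are all hypothetical. It builds an inverse that sends $\phi(\hat{\theta})$ back to $\hat{\theta}$, and compares it with a carelessly chosen inverse.

```python
def build_inverse(phi, space, theta_hat):
    """Pick one preimage for each point of phi(S), forcing phi_inv(phi(theta_hat)) = theta_hat."""
    inverse = {}
    for theta in space:
        inverse.setdefault(phi(theta), theta)    # any preimage is acceptable here
    inverse[phi(theta_hat)] = theta_hat          # the one choice the theorem requires
    return inverse

def induced_likelihood(likelihood, inverse):
    """M(theta*) = L(phi_inv(theta*)) tabulated over S*."""
    return {theta_star: likelihood(theta) for theta_star, theta in inverse.items()}

space = [-3, -2, -1, 0, 1, 2, 3]                             # hypothetical finite S

def likelihood(theta):
    return 1.0 / (1.0 + (theta + 2) ** 2)                    # stand-in L, peaked at -2

def phi(theta):
    return theta * theta                                     # onto S*, not 1-1

theta_hat = max(space, key=likelihood)

# Inverse defined as in the text: M is maximized at phi(theta_hat).
M = induced_likelihood(likelihood, build_inverse(phi, space, theta_hat))
theta_star_hat = max(M, key=M.get)

# A careless inverse (always the largest preimage) loses the property.
careless_inverse = {phi(theta): theta for theta in space}
M_careless = induced_likelihood(likelihood, careless_inverse)
careless_hat = max(M_careless, key=M_careless.get)
```

In this toy case the properly defined inverse gives a maximizer of 4, which equals $\phi(\hat{\theta})$ with $\hat{\theta} = -2$, while the careless inverse moves the maximizer to 0. The only difference between the two is which preimage was selected for $\phi(\hat{\theta})$.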

Having repeatedly emphasized the importance of the definition of M,
the ILF, it seems reasonable at this point to acknowledge the fact that
there may be many ways to define $\phi^{-1}$ and M. In some cases the definitions
may be such that M is not maximized at $\phi(\hat{\theta})$, but this is not necessarily
brought on by dropping the 1-1 restriction, and in fact these same remarks
apply even in the 1-1 case. In the restricted exponential estimation example,
which was 1-1, two likelihood functions were defined on $S^*$ and one was not
maximized at $\phi(\hat{\theta})$.

Although the term "likelihood function" has been used extensively in
theoretical statistics for quite a number of years, it appears that the term
may be used rather loosely unless more emphasis is placed on the definition
in a given problem. It is suggested that the notion of ILF may be an idea
which will help to emphasize this point.

3.2 Examples of the Application of Theorem 3.2 and the Induced Likelihood
Function to Maximum Likelihood Estimation

3.2.1  Geometric  distribution 

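1) Let $f(x;\theta) = \theta(1-\theta)^{x-1}$, $0 \le \theta \le 1$

2) $L(\theta) = \theta^n (1-\theta)^{n(\bar{x}-1)}$

   $L'(\theta) = n\theta^{n-1}(1-\theta)^{n\bar{x}-n} - \theta^n\, n(\bar{x}-1)(1-\theta)^{n\bar{x}-n-1}$

   $0 = n\hat{\theta}^{n-1}(1-\hat{\theta})^{n\bar{x}-n-1}\left[1 - \hat{\theta} - (\bar{x}-1)\hat{\theta}\right]$

   $\hat{\theta} = 1/\bar{x}$

3) Let $\lambda = \phi(\theta) = \theta$ for $0 \le \theta \le \frac{1}{2}$, and $\lambda = \phi(\theta) = 1 - \theta$ for $\frac{1}{2} \le \theta \le 1$

   $\therefore\ \phi: [0,1] \to [0,\frac{1}{2}]$ and is not 1-1

4) Define $\phi^{-1}(\lambda) = \theta = \lambda$ if $\bar{x} \ge 2$, and $\phi^{-1}(\lambda) = \theta = 1 - \lambda$ if $\bar{x} \le 2$

   $\therefore\ \phi^{-1}: [0,\frac{1}{2}] \to [0,1]$

5) Let $M(\lambda) = L(\phi^{-1}(\lambda))$, that is,

   $M(\lambda) = L(\lambda)$ if $\bar{x} \ge 2$
   $M(\lambda) = L(1-\lambda)$ if $\bar{x} \le 2$

   Therefore by theorem 3.2

   $\hat{\lambda} = \phi(\hat{\theta}) = \hat{\theta} = 1/\bar{x}$ if $\bar{x} \ge 2$
   $\hat{\lambda} = \phi(\hat{\theta}) = 1 - \hat{\theta} = 1 - 1/\bar{x}$ if $\bar{x} \le 2$

6) Checking the results directly:

   for $\bar{x} \ge 2$, $M(\lambda) = L(\lambda)$, therefore $\hat{\lambda} = 1/\bar{x}$

   for $\bar{x} \le 2$, $M(\lambda) = L(1-\lambda)$

   $M(\lambda) = (1-\lambda)^n \left[1 - (1-\lambda)\right]^{n(\bar{x}-1)}$

   $M'(\lambda) = -n(1-\lambda)^{n-1}\lambda^{n(\bar{x}-1)} + (1-\lambda)^n\, n(\bar{x}-1)\lambda^{n\bar{x}-n-1}$

   $0 = n(1-\hat{\lambda})^{n-1}\hat{\lambda}^{n\bar{x}-n-1}\left[-\hat{\lambda} + (1-\hat{\lambda})(\bar{x}-1)\right]$

   $1 = \bar{x}(1-\hat{\lambda})$

   $\hat{\lambda} = 1 - 1/\bar{x}$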
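
The direct check in step 6 can also be carried out numerically. The sketch below is an illustration under assumed inputs (the values of `n` and `x_bar` are hypothetical, and a grid search over $(0, \frac{1}{2}]$ replaces the differentiation); it maximizes the induced likelihood for one sample mean on each side of 2.

```python
import math

def log_L(theta, n, x_bar):
    """log L(theta) = n log(theta) + n (x_bar - 1) log(1 - theta), geometric sample."""
    return n * math.log(theta) + n * (x_bar - 1.0) * math.log(1.0 - theta)

def phi(theta):
    """Fold [0, 1] onto [0, 1/2]; not 1-1."""
    return theta if theta <= 0.5 else 1.0 - theta

def phi_inv(lam, x_bar):
    """Branch of the inverse chosen so that phi_inv(phi(theta_hat)) = theta_hat."""
    return lam if x_bar >= 2.0 else 1.0 - lam

def lambda_hat_numeric(n, x_bar, steps=5000):
    """Grid maximizer of M(lam) = L(phi_inv(lam)) over (0, 1/2]."""
    grid = [k / (2.0 * steps) for k in range(1, steps + 1)]
    return max(grid, key=lambda lam: log_L(phi_inv(lam, x_bar), n, x_bar))

# Hypothetical sample means on either side of 2.
lam_large_mean = lambda_hat_numeric(n=20, x_bar=4.0)     # theta_hat = 0.25
lam_small_mean = lambda_hat_numeric(n=20, x_bar=1.25)    # theta_hat = 0.80
```

In both cases the grid maximizer should agree with $\phi(1/\bar{x})$ up to the grid spacing, which is what theorem 3.2 predicts.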

3. 2.2  Normal  Distribution 

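1)  Let $f(x;\theta) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-\theta)^{2}}$  for $-\infty < \theta < \infty$

2)  $L(\theta) = (2\pi)^{-\frac{n}{2}}\, e^{-\frac{1}{2}\sum (x_i-\theta)^{2}}$

    $0 = \sum x_i - n\hat{\theta}$

    $\hat{\theta} = \bar{x}$

3)  Let $\lambda = \phi(\theta) = \theta^{2}$

    Therefore $\phi : E_1 \to [0,\infty)$ and is not 1-1.

4)  Define $\phi^{-1}(\lambda) = \theta = \begin{cases} \sqrt{\lambda} & \text{if } \bar{x} \ge 0 \\ -\sqrt{\lambda} & \text{if } \bar{x} < 0 \end{cases}$

    Therefore $\phi^{-1} : [0,\infty) \to E_1$

5)  Let $M(\lambda) = L(\phi^{-1}(\lambda)) = \begin{cases} L(\sqrt{\lambda}) & \text{if } \bar{x} \ge 0 \\ L(-\sqrt{\lambda}) & \text{if } \bar{x} < 0 \end{cases}$

    Therefore by theorem 3.2

    $\hat{\lambda} = \phi(\hat{\theta}) = \hat{\theta}^{2} = \bar{x}^{2}$

6)  Checking the results directly

    if $\bar{x} \ge 0$,  $M(\lambda) = L(\sqrt{\lambda}) = (2\pi)^{-\frac{n}{2}}\, e^{-\frac{1}{2}\sum (x_i-\sqrt{\lambda})^{2}}$

    $0 = \sum x_i - n\sqrt{\hat{\lambda}}$

    and $\sqrt{\hat{\lambda}} = \bar{x}$, therefore $\hat{\lambda} = \bar{x}^{2}$

    if $\bar{x} < 0$,  $M(\lambda) = L(-\sqrt{\lambda}) = (2\pi)^{-\frac{n}{2}}\, e^{-\frac{1}{2}\sum (x_i+\sqrt{\lambda})^{2}}$

    $M'(\lambda) = -(2\pi)^{-\frac{n}{2}}\, \frac{1}{2\sqrt{\lambda}}\, e^{-\frac{1}{2}\sum (x_i+\sqrt{\lambda})^{2}}\, \left[\sum (x_i+\sqrt{\lambda})\right]$

    $0 = \sum x_i + n\sqrt{\hat{\lambda}}$

    $\sqrt{\hat{\lambda}} = -\bar{x}$, therefore $\hat{\lambda} = \bar{x}^{2}$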
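
The role played by the data-dependent choice of inverse in the normal example can be illustrated with a short Python sketch. It maximizes the induced likelihood on a grid twice, once with the branch of the inverse that agrees in sign with the sample mean and once with the opposite branch. The sample, the grid limits and the function names are hypothetical, and the additive constant of the log likelihood is dropped.

```python
import math

def log_lik(theta, sample):
    # log L(theta) up to an additive constant, unit-variance normal model
    return -0.5 * sum((x - theta) ** 2 for x in sample)

def maximize_M(sample, sign, lam_max=4.0, steps=4000):
    # grid-maximize M(lambda) = L(sign * sqrt(lambda)) over [0, lam_max]
    grid = [lam_max * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda lam: log_lik(sign * math.sqrt(lam), sample))

sample = [-1.2, -0.4, -2.1, 0.3, -1.6]      # hypothetical data, mean -1.0
xbar = sum(sample) / len(sample)

lam_theorem = xbar ** 2                     # phi(theta_hat) = xbar^2
branch = 1.0 if xbar >= 0 else -1.0         # branch prescribed in step 4
lam_right = maximize_M(sample, branch)      # induced likelihood as defined
lam_wrong = maximize_M(sample, -branch)     # opposite branch, for contrast
```

With the prescribed branch the grid maximum should fall at the square of the sample mean, while the opposite branch is maximized at the boundary value zero for this sample. This is consistent with the emphasis placed on defining the induced likelihood function properly.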

3.2.3  Binomial  Distribution 

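1)  Let $f(x;\theta) = \theta^{x}(1-\theta)^{1-x}$,  $x = 0, 1$,  for $0 < \theta < 1$

2)  $L(\theta) = \theta^{n\bar{x}}(1-\theta)^{n(1-\bar{x})}$

    $L'(\theta) = n\bar{x}\,\theta^{n\bar{x}-1}(1-\theta)^{n-n\bar{x}} - n(1-\bar{x})\,\theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}-1}$

    $\phantom{L'(\theta)} = n\theta^{n\bar{x}-1}(1-\theta)^{n-n\bar{x}-1}\,[\,\bar{x}(1-\theta) - \theta + \theta\bar{x}\,]$

    $0 = \bar{x}(1-\hat{\theta}) - \hat{\theta} + \hat{\theta}\bar{x}$

    $\hat{\theta} = \bar{x}$

3)  Let $\lambda = \phi(\theta) = \begin{cases} 2\theta & \text{for } 0 < \theta \le \frac{1}{2} \\ 2-2\theta & \text{for } \frac{1}{2} < \theta < 1 \end{cases}$

    Therefore $\phi : (0,1) \to (0,1]$ but is not 1-1.

4)  Define $\phi^{-1}(\lambda) = \theta = \begin{cases} \frac{\lambda}{2} & \text{if } \bar{x} \le \frac{1}{2} \\ \frac{2-\lambda}{2} & \text{if } \bar{x} > \frac{1}{2} \end{cases}$

    Therefore $\phi^{-1} : (0,1] \to (0,1)$

5)  Let $M(\lambda) = L(\phi^{-1}(\lambda)) = \begin{cases} L(\frac{\lambda}{2}) & \text{if } \bar{x} \le \frac{1}{2} \\ L(\frac{2-\lambda}{2}) & \text{if } \bar{x} > \frac{1}{2} \end{cases}$

    Therefore by theorem 3.2

    $\hat{\lambda} = \phi(\hat{\theta}) = \begin{cases} 2\hat{\theta} = 2\bar{x} & \text{if } \bar{x} \le \frac{1}{2} \\ 2-2\hat{\theta} = 2(1-\bar{x}) & \text{if } \bar{x} > \frac{1}{2} \end{cases}$

6)  Checking the results directly

    if $\bar{x} \le \frac{1}{2}$,  $M(\lambda) = L(\frac{\lambda}{2}) = (\frac{\lambda}{2})^{n\bar{x}}(1-\frac{\lambda}{2})^{n(1-\bar{x})}$

    $M'(\lambda) = \frac{n}{2}(\frac{\lambda}{2})^{n\bar{x}-1}(1-\frac{\lambda}{2})^{n-n\bar{x}-1}\,[\,\bar{x}(1-\frac{\lambda}{2}) - \frac{\lambda}{2}(1-\bar{x})\,]$

    $0 = \bar{x} - \frac{\hat{\lambda}}{2}$

    $\hat{\lambda} = 2\bar{x}$

    if $\bar{x} > \frac{1}{2}$,  $M(\lambda) = L(\frac{2-\lambda}{2}) = (\frac{2-\lambda}{2})^{n\bar{x}}\,[1-(\frac{2-\lambda}{2})]^{n(1-\bar{x})}$

    $M'(\lambda) = -\frac{n}{2}(\frac{2-\lambda}{2})^{n\bar{x}-1}\,[1-(\frac{2-\lambda}{2})]^{n-n\bar{x}-1}\,[\,\bar{x}(1-\frac{2-\lambda}{2}) - (\frac{2-\lambda}{2})(1-\bar{x})\,]$

    $0 = \bar{x} - (\frac{2-\hat{\lambda}}{2})$

    $\hat{\lambda} = 2 - 2\bar{x} = 2(1-\bar{x})$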
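
The binomial example also shows concretely what the loss of the one-to-one property means: two samples with different estimates of the original parameter can lead to the same estimate of the transformed parameter. The Python sketch below, with hypothetical samples and function names, maximizes the induced likelihood on a grid for two such samples.

```python
import math

def log_lik(theta, n, xbar):
    # log L(theta) for n Bernoulli trials with sample mean xbar
    return n * xbar * math.log(theta) + n * (1 - xbar) * math.log(1 - theta)

def phi(theta):
    # tent map of step 3 on (0, 1)
    return 2 * theta if theta <= 0.5 else 2 - 2 * theta

def phi_inv(lam, xbar):
    # branch of the inverse selected by the data, as in step 4
    return lam / 2 if xbar <= 0.5 else (2 - lam) / 2

def induced_mle(sample, steps=2000):
    # grid-maximize M(lambda) = L(phi_inv(lambda)) over (0, 1]
    n = len(sample)
    xbar = sum(sample) / n
    grid = [k / steps for k in range(1, steps + 1)]
    return max(grid, key=lambda lam: log_lik(phi_inv(lam, xbar), n, xbar))

mostly_ones = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]    # hypothetical, mean 0.7
mostly_zeros = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]   # hypothetical, mean 0.3

lam_a = induced_mle(mostly_ones)     # theorem gives 2(1 - 0.7)
lam_b = induced_mle(mostly_zeros)    # theorem gives 2(0.3)
```

Both grid searches should return a value close to 0.6, in agreement with the two branches of the result in step 5.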
3.3  The Multidimensional Estimation Problem

At this point, it seems logical to consider how the theorem applies in
the estimation problem with a multidimensional parameter space. All examples
considered to this point have been one-dimensional. However, since restrictions
on dimensionality of the parameter space do not occur in the theorem
or its proof, it follows that the theorem applies to the multidimensional
estimation problem. Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. If $\theta$ is multidimensional,
then so is $\hat{\theta}$ and the components $(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k)$ are said to be the
joint maximum likelihood estimates of the corresponding $\theta_i$.

Consider the following example of the normal distribution with $\mu = \theta_1$,
$\sigma^{2} = \theta_2$ and k = 2.

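1)  $f(x;\theta) = f(x;\theta_1,\theta_2) = \frac{1}{\sqrt{2\pi\theta_2}}\, e^{-\frac{(x-\theta_1)^{2}}{2\theta_2}}$

2)  $L(\theta) = (2\pi)^{-\frac{n}{2}}(\theta_2)^{-\frac{n}{2}}\, e^{-\frac{1}{2\theta_2}\sum (x_i-\theta_1)^{2}}$

    $\hat{\theta}_1 = \bar{x} = \hat{\mu}$

    $\hat{\theta}_2 = \frac{1}{n}\sum (x_i-\bar{x})^{2} = s^{2} = \hat{\sigma}^{2}$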

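3)  Let $\lambda = (\lambda_1,\lambda_2) = \phi(\theta) = \phi(\theta_1,\theta_2) = (\theta_1^{2},\, \theta_2^{-1})$

    $\phi$ is not 1-1

4)  Define $\phi^{-1}(\lambda) = (\theta_1,\theta_2) = \begin{cases} (\sqrt{\lambda_1},\, \frac{1}{\lambda_2}) & \text{if } \bar{x} \ge 0 \\ (-\sqrt{\lambda_1},\, \frac{1}{\lambda_2}) & \text{if } \bar{x} < 0 \end{cases}$

5)  Define $M(\lambda) = L(\phi^{-1}(\lambda)) = \begin{cases} L(\sqrt{\lambda_1},\, \frac{1}{\lambda_2}) & \text{if } \bar{x} \ge 0 \\ L(-\sqrt{\lambda_1},\, \frac{1}{\lambda_2}) & \text{if } \bar{x} < 0 \end{cases}$

    Therefore by theorem 3.2

    $\hat{\lambda} = \phi(\hat{\theta}) = \phi(\hat{\theta}_1,\hat{\theta}_2) = (\hat{\theta}_1^{2},\, \hat{\theta}_2^{-1}) = (\bar{x}^{2},\, \frac{1}{s^{2}})$

6)  Checking the results directly

    if $\bar{x} \ge 0$,  $M(\lambda) = L(\sqrt{\lambda_1},\, \frac{1}{\lambda_2}) = (2\pi)^{-\frac{n}{2}}\,\lambda_2^{\frac{n}{2}}\, e^{-\frac{\lambda_2}{2}\sum (x_i-\sqrt{\lambda_1})^{2}}$

    $\frac{\partial M}{\partial \lambda_1} = M(\lambda)\,\frac{\lambda_2}{2\sqrt{\lambda_1}}\,\left[\sum (x_i-\sqrt{\lambda_1})\right]$

    $0 = \sum x_i - n\sqrt{\hat{\lambda}_1}$

    $\hat{\lambda}_1 = \bar{x}^{2} = \hat{\theta}_1^{2}$

    $\frac{\partial M}{\partial \lambda_2} = M(\lambda)\,\left[\frac{n}{2\lambda_2} - \frac{1}{2}\sum (x_i-\sqrt{\lambda_1})^{2}\right]$

    $0 = \frac{n}{2\hat{\lambda}_2} - \frac{1}{2}\sum (x_i-\bar{x})^{2}$

    $\hat{\lambda}_2 = \frac{n}{\sum (x_i-\bar{x})^{2}} = \frac{1}{s^{2}} = \hat{\theta}_2^{-1}$

    Similarly if $\bar{x} < 0$.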
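
A two-dimensional case can be checked numerically in the same spirit. The Python sketch below assumes, for illustration, a transformation that takes the mean and variance of a normal distribution to the squared mean and the reciprocal variance; this choice of transformation, the sample and the function names are all assumptions of the sketch. Rather than searching a full grid, it confirms that the value obtained by transforming the joint mle is not beaten by any point on a small lattice around it.

```python
import math

def log_M(lam1, lam2, sample, sign):
    # log of M(lambda) = L(sign * sqrt(lam1), 1 / lam2), constant term dropped
    n = len(sample)
    theta1 = sign * math.sqrt(lam1)
    ss = sum((x - theta1) ** 2 for x in sample)
    return 0.5 * n * math.log(lam2) - 0.5 * lam2 * ss

sample = [2.1, 1.4, 3.0, 2.5, 1.0]          # hypothetical data, mean 2.0
n = len(sample)
xbar = sum(sample) / n
s2 = sum((x - xbar) ** 2 for x in sample) / n   # mle of the variance

sign = 1.0 if xbar >= 0 else -1.0           # branch of the inverse picked by the data
lam_hat = (xbar ** 2, 1.0 / s2)             # transformation applied to the joint mle

# compare against a 5 x 5 lattice of nearby points (offsets are arbitrary)
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]
best = log_M(lam_hat[0], lam_hat[1], sample, sign)
is_local_max = all(
    log_M(lam_hat[0] + d1, lam_hat[1] + d2, sample, sign) <= best + 1e-12
    for d1 in offsets
    for d2 in offsets
)
```

A lattice comparison of this kind cannot prove a global maximum; it only fails to contradict the result of the theorem for the sample at hand.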

In some situations we may desire to estimate only a portion of the $\theta_i$.
It should be noted that even though the estimates of only certain components
of $\theta$ are desired, it may be necessary to estimate the remaining parameters
since the maximizing values for the desired set usually depend on the remaining
parameters. This characteristic is demonstrated in the example just
completed where the mle of the variance depends on the mle of the mean.

In estimating only certain of the components of $\theta$ when the remaining
parameters are unknown, theorem 3.2 is applied. However, if some of the remaining
parameters are known, then the problem is quite different and the
dimension of the parameter (estimation) space is reduced by one for each
known parameter value. The problem of estimating the variance of a normal
distribution with parameters $\mu = \theta_1$ and $\sigma^{2} = \theta_2$ serves to illustrate this
point.

Case I :  $\mu$, $\sigma^{2}$ unknown

We have seen that $\hat{\theta}_1 = \bar{x}$ and $\hat{\theta}_2 = s^{2}$. In this case, the parameter
space is two-dimensional, a half-plane. That is, $L : S \to E_1$ where S is
$E_1 \times (0,\infty)$.

Case II :  $\mu$ known, $\sigma^{2}$ unknown

In this case $f(x;\theta) = \frac{1}{\sqrt{2\pi\theta_2}}\, e^{-\frac{(x-\mu)^{2}}{2\theta_2}}$. Since $\mu$ is known, L is
a function of $\theta_2$ only and it is well known that $\hat{\theta}_2 = \frac{1}{n}\sum (x_i-\mu)^{2}$. In
this case the problem is no longer to estimate a component of a two-dimensional
$\theta$; rather we have a new one-dimensional estimation problem where S
is a subset of $E_1$.
Note that Case I produced $\hat{\theta}_2 = \frac{1}{n}\sum (x_i-\bar{x})^{2}$ and Case II produces
$\hat{\theta}_2 = \frac{1}{n}\sum (x_i-\mu)^{2}$, usually quite different results. These differences,
however, are not due to the application of the theorem, but result from the
fact that the two likelihood functions are different.
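
The difference between the two cases is easily seen numerically. The following Python sketch computes both variance estimates for one sample; the data, the assumed known mean and the function names are hypothetical. The two estimates differ by exactly the squared distance between the sample mean and the known mean, so they coincide only when these two means are equal.

```python
def variance_mle_mean_unknown(sample):
    # Case I: the mean is estimated by the sample mean
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / n

def variance_mle_mean_known(sample, mu):
    # Case II: the mean is a known constant mu
    return sum((x - mu) ** 2 for x in sample) / len(sample)

sample = [2.1, 1.4, 3.0, 2.5, 1.0]   # hypothetical data, sample mean 2.0
mu_known = 1.5                       # hypothetical known mean for Case II

case_one = variance_mle_mean_unknown(sample)
case_two = variance_mle_mean_known(sample, mu_known)
xbar = sum(sample) / len(sample)
gap = case_two - case_one            # equals (xbar - mu_known) ** 2
```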
SECTION IV
SUMMARY

4.1  Summary of Findings

The objective of this study was to investigate and formalize concepts
and definitions that would allow the invariant property of MLE to be extended
beyond the usually assumed 1-1 estimation situation. The induced likelihood
function was introduced, and it has been shown that by properly defining
the ILF, theorem 3.2 provides the tool for applying the invariance principle
in the estimation problem with a transformation which is not 1-1.
The theorem was shown to be equally applicable in the 1 or k dimension
estimation situation.

In the development of theorem 3.2 it has been strongly emphasized that
the power of the technique lies in the defining of the new likelihood function,
the likelihood function induced on $S^{*}$. It is felt that, in the past,
not enough emphasis has been focused on this induced likelihood function.

4.2  Proposed  Areas  for  Further  Study 

This study has not attempted to investigate the distribution theory related
to the mle's $\hat{\lambda} = \phi(\hat{\theta})$ derived using the ILF. Certainly, it is important
to know if present mle distribution theory is still applicable in
the unrestricted estimation situation. Therefore, it is suggested that an
area which presents fertile ground for study is mle distribution theory in
the new situations covered in this study.

The examples presented in this investigation are simple and are intended
merely to acquaint the reader with the proposed use of the theorem
and the ILF. It is hoped that this study has generated reader interest
which will result in application of the induced likelihood function and
the associated theorem in a wide variety of estimation situations.
APPENDIX ONE

SYMBOLS AND ABBREVIATIONS

Definition                                                  Page*

maximum likelihood estimate                                   1
one-to-one                                                    1
maximum likelihood estimation                                 1
observed value of random variable $X_i$                       3
index for i-th parameter                                      3
probability density function                                  3
probability density function                                  3
the expectation of x                                          3
a sample or observed outcome                                  3
the conditional pdf of $\theta$ given X = x                   5
the likelihood function                                       5
an estimator                                                  6
the function f is such that it maps S into E                 13
the real line, Euclidean 1-space                             13
the function $\phi$ is such that it maps S onto $S^{*}$
  (onto implies "exhaustive") and is 1-1                     13
the induced likelihood function                              14
the induced likelihood function                              14
the closed interval 0, 1                                     17
the half-closed interval 0, 1                                18
the open interval 0, 1                                       18

* Page on which symbol originally was introduced